Rot(m)(g) batched and strided_batched #737
Merged
Conversation
TorreZuk
approved these changes
Oct 4, 2019
Contributor
TorreZuk
left a comment
Approved with changes required. This PR is probably too big. It probably should have been split into two or more PRs, one for each function. That way the design pattern is approved before going too far.
0);
else // array of params on host, copy to device
{
    // This should NOT happen from calls from the API currently.
Contributor
I vote we don't support this.
Contributor
Author
This has been addressed in 474e196. We now support:
- batched param for rotm(g)(strided)batched.
- param can be located on either host or device for original rotm, and for all rotmg (host version of rotmg is entirely carried out on the host).
- param can only be located on device for (strided_)batched versions of rotm.
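For readers unfamiliar with how a per-batch param is consumed, here is a minimal host-side sketch of a strided-batched rotm. It is not the rocBLAS kernel: the function name `rotm_strided_batched_ref` and its argument list are hypothetical, the data lives in ordinary host memory, and the increments are assumed positive. The only thing taken from reference BLAS is the param layout {flag, h11, h21, h12, h22}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference, for illustration only.
// Each batch b reads its own 5-element param block at param + b * stride_param.
template <typename T>
void rotm_strided_batched_ref(int n,
                              T* x, int incx, std::ptrdiff_t stride_x,
                              T* y, int incy, std::ptrdiff_t stride_y,
                              const T* param, std::ptrdiff_t stride_param,
                              int batch_count)
{
    assert(incx > 0 && incy > 0); // assumption: positive increments only

    for(int b = 0; b < batch_count; ++b)
    {
        const T* p    = param + b * stride_param;
        const T  flag = p[0];

        if(flag == T(-2))
            continue; // H is the identity, nothing to do for this batch

        T h11, h21, h12, h22;
        if(flag == T(-1)) // full matrix stored in param
        {
            h11 = p[1]; h21 = p[2]; h12 = p[3]; h22 = p[4];
        }
        else if(flag == T(0)) // unit diagonal implied
        {
            h11 = T(1); h21 = p[2]; h12 = p[3]; h22 = T(1);
        }
        else // flag == 1: off-diagonal (1, -1) implied
        {
            h11 = p[1]; h21 = T(-1); h12 = T(1); h22 = p[4];
        }

        T* xb = x + b * stride_x;
        T* yb = y + b * stride_y;
        for(int i = 0; i < n; ++i)
        {
            const T w = xb[i * incx];
            const T z = yb[i * incy];
            xb[i * incx] = w * h11 + z * h12;
            yb[i * incy] = w * h21 + z * h22;
        }
    }
}
```

Because every batch dereferences its own param block, a device kernel needs that array to be readable from the device, which is why host-resident param for the (strided_)batched variants would require a copy first.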
TorreZuk
approved these changes
Oct 7, 2019
amcamd
approved these changes
Oct 7, 2019
n         rocblas_int
          number of elements in the x and y vectors.
@param[inout]
x         array of pointers storing vector x on the GPU.
Contributor
For c and s the comment says "host or device memory", while for x and y the comment says "on the GPU". It would be better to be consistent and say "in device memory" in place of "on the GPU".
n         rocblas_int
          number of elements in the x and y vectors.
@param[inout]
x         pointer storing strided vectors x on the GPU.
Contributor
See the comment above on keeping the descriptions consistent.
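The two doc comments under review describe different layouts for x: an array of pointers in the batched variant, and a single pointer with a fixed stride in the strided_batched variant. A small sketch of the addressing difference follows. The helper names `batched_vector` and `strided_batched_vector` are hypothetical and the buffers here are plain host memory, whereas in the library they would be in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// batched: x is an array of batch_count pointers, one per vector.
template <typename T>
T* batched_vector(T* const* x, int b)
{
    return x[b];
}

// strided_batched: x is one base pointer; vector b starts b * stride elements in.
template <typename T>
T* strided_batched_vector(T* x, std::ptrdiff_t stride, int b)
{
    return x + b * stride;
}
```

With a contiguous buffer of two length-3 vectors, both forms can resolve to the same address for batch 1: `ptrs[1]` in the batched case and `base + 1 * 3` in the strided case.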
wbgilmartin
pushed a commit
to wbgilmartin/rocBLAS
that referenced
this pull request
Oct 9, 2019
* Changed timeout from hours to minutes (ROCm#699) * set clang include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (ROCm#701) * SLES support (ROCm#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (ROCm#695) * Fixing Timeout * BF16 replacement kernels (ROCm#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)" This reverts commit 402d231. * bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (ROCm#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (ROCm#708) * Batched syr (ROCm#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (ROCm#719) * Refactor Ger and Gemv (ROCm#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (ROCm#737) * SWDEV 203994 (ROCm#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release * update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing 
packaging * New Winograd kernels added (ROCm#742)
wbgilmartin
added a commit
that referenced
this pull request
Oct 21, 2019
* integration of the new tensile client * New tensile client (#1) * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * Map values to value categories currently represented as double * minor changes * merge Bill & Lee's changes * Merge Bill & Lee's changes * Fix build errors * Merge 2.10 develop into new tensile client (#2) * Changed timeout from hours to minutes (#699) * set clang include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (#701) * SLES support (#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (#695) * Fixing Timeout * BF16 replacement kernels (#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for gemm_ex benchmark calls (#667)" This reverts commit 402d231. 
* bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (#708) * Batched syr (#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (#719) * Refactor Ger and Gemv (#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (#737) * SWDEV 203994 (#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release * update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing packaging * New Winograd kernels added (#742) * fix for changes of FreeIndices change in Tensile * New tensile client (#3) * Changed timeout from hours to minutes (#699) * set clang include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (#701) * SLES support (#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (#695) * Fixing Timeout * BF16 replacement kernels (#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for 
gemm_ex benchmark calls (#667)" This reverts commit 402d231. * bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (#708) * Batched syr (#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (#719) * Refactor Ger and Gemv (#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (#737) * SWDEV 203994 (#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release * update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing packaging * New Winograd kernels added (#742) * Fix GEMM for half type * updates to get rocblas-test and half sizes to work * partial fix for NaN test failures * New tensile client (#4) * Changed timeout from hours to minutes (#699) * set clang include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (#701) * SLES support (#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (#695) * Fixing Timeout * BF16 
replacement kernels (#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for gemm_ex benchmark calls (#667)" This reverts commit 402d231. * bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (#708) * Batched syr (#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (#719) * Refactor Ger and Gemv (#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (#737) * SWDEV 203994 (#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release * update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing packaging * New Winograd kernels added (#742) * Fix GEMM for half type * Refactoring classes to be simpler * Fix rocblas_half * fix negative workgroup mapping error * fix WorkGroupMapping issue for files in asm_lite * more fixes for workgroupmapping issue * wgm issue for asm_miopen * New tensile client (#5) * Changed timeout from hours to minutes (#699) * set clang 
include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (#701) * SLES support (#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (#695) * Fixing Timeout * BF16 replacement kernels (#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for gemm_ex benchmark calls (#667)" This reverts commit 402d231. * bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (#708) * Batched syr (#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (#719) * Refactor Ger and Gemv (#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (#737) * SWDEV 203994 (#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release * update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing packaging * New Winograd kernels added (#742) * Changing Gemv and Ger stride type (#747) * Handle spaces and newline 
(#748) * Fix GEMM for half type * Tuned Shakespeare kernels (#749) * Refactoring classes to be simpler * Fix rocblas_half * fix argument validation in gemm calls * fix complex strided batch implementation * fix validateArgs redefinition * New tensile client (#6) * Changed timeout from hours to minutes (#699) * set clang include directory, fix for centos build error * hot fix to restore loading of DGEMM replacement kernels (#701) * SLES support (#704) * Merging master with SLES commit * Specifying GPU architecture for ubuntu and sles (#695) * Fixing Timeout * BF16 replacement kernels (#705) * hot fix to restore loading of DGEMM replacement kernels * Revert "Switch to using separate D for gemm_ex benchmark calls (#667)" This reverts commit 402d231. * bf16 kernels for gfx908 * use bf16 UseBeta=0 replacement kernels * update tensile_tag to use bf16 UseBeta=0 replacement kernels * Restore usebeta1 logic (#707) * restore UseBeta=1 logic for arcturus BF16 TN * Supporting clang10 for SLES (#708) * Batched syr (#727) * adding syr batched and strided batched * rocblas_stride, reusable template pattern work * WIP * fixes testing and format * restore dependency * adds minimal batches & bad arg * fix bad arg testing * constify ptrs, spelling * add alpha vector support, PR feedback * more BF16 TN sizes * Refactoring * Move tensile_host.hpp from public C API to private C++ implementation * Work around missing complex; fix formatting * Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API * gf908 BF16 TN 512x512x512 known issue * Enable SLES packaging (#719) * Refactor Ger and Gemv (#735) * Map values to value categories currently represented as double * Rot(m)(g) batched and strided_batched (#737) * SWDEV 203994 (#743) * Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit * Fixes * update * version for master branch release * version for develop branch release 
* update Tensile package number * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741) * Fixing SLES tests LD_LIBRARY_PATH and refactoring tests * Missing braces/spelling * spelling * Fixing packaging * New Winograd kernels added (#742) * Changing Gemv and Ger stride type (#747) * Handle spaces and newline (#748) * Fix GEMM for half type * Tuned Shakespeare kernels (#749) * Refactoring classes to be simpler * Fix rocblas_half * Cleanup source
mlse-lib-jenkins
pushed a commit
that referenced
this pull request
Jun 11, 2021
ROCm 4.4 merge staging into master
Adds the following functions:
Edit:
Does not support batched param in rotm and rotmg from API. Supports batched param at _template level. Unsure if we want to support batched param on the host as this forces a hipMemcpy in the _template function.
DOES support batched param in rotm and rotmg from API. Does NOT support param located on the host for (strided_)batched versions of rotm.
Testing
The testing times for all rot functions, along with Andrew's estimates of the testing time for 4 BLAS1 functions, follow: